Digitizing the Culicidae collection of Naturalis Biodiversity Center, with a special focus on the former Bonne-Wepster subcollection

Natural history collections contain a wealth of information on species diversity, distribution and ecology. However, due to historical and practical constraints, this valuable information is not always available to researchers. Our project aimed to unlock data handwritten in notebooks owned by Johanna Bonne-Wepster, a Culicidae researcher. These handwritten notes refer to specimens labeled with a number only. The notebooks were scanned and their contents were entered into a Google spreadsheet. The specimens were provided with a unique identifier and labeled with the information from the notebooks, and the data was exported to the Global Biodiversity Information Facility. In addition, the type specimens were photographed. Besides Johanna Bonne-Wepster’s collection, mosquitoes from the former Rijksmuseum van Natuurlijk Historie collection and the former Zoölogisch Museum Amsterdam Nederland collection were digitized. All specimens are now housed at the Naturalis Biodiversity Center museum in Leiden. This paper describes the efforts to mobilize this data and the problems we encountered.

specimens. In the collection, the specimens were labeled only with a number. Due to the age of the field books, we feared losing the information they contained. Therefore, the field books were scanned, specimens were provided with labels and the data was entered in a database.
Before we started our project, we found that some mosquitoes with a Bonne-Wepster number were already provided with locality and species labels. Most CMGL mosquitoes are missing; therefore, these specimens could only be digitized as observation records (40,705).
In addition, 1,216 mosquitoes from the former Zoölogisch Museum Amsterdam Nederland (ZMAN) collection and 2,388 mosquitoes from the former RMNH collection were digitized as well. All specimens belong to the NBC collection (Table 1).

CONTEXT
Natural history collections are a rich data source that serves scientific research, education and the general public [1]. Traditionally, specimens were collected for private use in 'cabinets of curiosity' and subsequently adopted by museums, where they became the focal point of taxonomic research [2].
In recent years, natural history collections have embraced other fields of the biological sciences besides taxonomy. Despite sampling biases, natural history collections can be used to model current and past distributions [3], build molecular libraries [4], analyze biodiversity (e.g., for conservation purposes) [5], assess hybridization and speciation events [6], and predict the reemergence of diseases [7].
The natural history collection of NBC contains about 42 million objects sampled over the past 200 years [8]. A small part of this collection is formed by mosquitoes. Mosquitoes are among the most feared insects for their vector role in transmitting a wide array of pathogens, the most notorious being protozoans of the genus Plasmodium, the causative agents of malaria [9]. According to the World Health Organization, 619,000 people died of malaria in 2021 [10].
Most of the mosquito specimens now housed at NBC were owned and identified by Johanna Bonne-Wepster. The collection was used to study the morphological features that characterize mosquito species. The specimens were provided with labels having a handwritten number pinned below them (Figure 1). These numbers were linked to information contained in eight different field books (Figure 2). The data was organized in columns indicating the species name, a description of the collection place, the collector and, sometimes, short taxonomical notes (Figure 3). As the field books of Johanna Bonne-Wepster are very old, we feared losing the data within and, consequently, the value of the specimens. Indeed, as beautifully stated by Lane [11], a specimen separated from its label has no scientific value:
'Together, a preserved organism and its label are a scientific specimen that has great intrinsic value. Separately, the label is a piece of paper with meaningless inscriptions upon it, and the plant, spider, microbe, mushroom, or bird, though carefully preserved, is just so much dead organic matter.' (Lane, 1996: 536)
Hence, we strongly felt the need to 'rescue' the information in the field books and link this information to the specimens.
Dutch East Indies, she conducted thorough investigations about this family of blood-sucking insects. Her main goal was to give non-taxonomists the means to recognize vector species.
In addition, the couple contributed to the SEAMP, an international collaboration between the American and Dutch militaries. This important research project lasted until after the
In the seventies, her collection was transferred to the RMNH in Leiden (Table 1). Initially, the museum obtained only the sampled specimens, not the field books with detailed field notes. The field books appeared to be lost until a curator of the RMNH visited the elderly Mrs. Bonne-Wepster and retrieved them. Mrs. Bonne-Wepster passed away in 1978.

Purpose
The main purpose of the project was to mobilize the data contained in the field books of Bonne-Wepster and corroborate it with the associated specimens. As we proceeded with the project, mosquitoes of the former ZMAN collection and the RMNH collection were digitized as well.

Sampling description
All specimens were carefully checked, investigated and provided with unique registration numbers (Figure 4). This project produced 55,706 records. Among them, 52,102 records originated from the former Bonne-Wepster collection, including 40,705 missing CMGL specimens that were digitized as observation records. Additionally, 2,388 records pertain to mosquitoes from the former RMNH Culicidae collection, and 1,216 records refer to specimens from the former ZMAN Culicidae collection (Table 1).
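The record counts above can be cross-checked with a few lines of arithmetic. The snippet below is only an illustrative sketch: the figures are copied from the text, while the dictionary keys and variable names are our own labels and not fields of the published dataset.

```python
# Record counts as stated in the text; the keys are illustrative labels only.
records_per_subcollection = {
    "Bonne-Wepster": 52_102,  # includes the 40,705 observation records
    "RMNH": 2_388,
    "ZMAN": 1_216,
}
observation_records = 40_705  # missing CMGL specimens

# Sum of the three subcollections should equal the reported project total.
total_records = sum(records_per_subcollection.values())

# Bonne-Wepster records that are not observation records.
non_observation_bw = records_per_subcollection["Bonne-Wepster"] - observation_records

print(total_records)       # 55706
print(non_observation_bw)  # 11397
```

The sum reproduces the reported total of 55,706 records; subtracting the observation records leaves 11,397 Bonne-Wepster records.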

Process
(1) First, we entered the data of the field books in a Google spreadsheet.
(2) Next, the field books were scanned and stored according to the Naturalis Archive protocol.
(3) The entered records were georeferenced using the point-radius method, as described by Wieczorek et al. [18] (see below: Coordinates).
(4) Unique registration numbers with a corresponding QR code were added to the specimens. When the specimens were only labeled with a Bonne-Wepster number, additional labels were printed and added. Those labels were (a) a locality label, (b) a species name label and (c) an 'ex. coll. Bonne-Wepster' label (Figure 4).
(5) The labels accompanying the (holo)types were photographed using a Nikon D600 camera equipped with an AF Micro-Nikkor 60 mm f/2.8D lens. An overhead camera setup was used to photograph the labels from above.
(6) The Google spreadsheet was converted into a standardized sheet format that could be imported into the NBC database. Table 2 presents an overview of the specimen-specific information included in this sheet.
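The conversion in the final step can be pictured as a column mapping from the working spreadsheet to the import sheet. The sketch below is hypothetical: the actual column names of the Google spreadsheet and of the NBC import format are not given in this paper, so every header in `COLUMN_MAP`, the function name `convert_sheet` and the example row are assumptions made for illustration.

```python
import csv
import io

# Hypothetical mapping: spreadsheet header -> import-sheet header.
COLUMN_MAP = {
    "number": "field_number",
    "species": "scientific_name",
    "locality": "verbatim_locality",
    "collector": "collector",
    "notes": "name_comments",
}


def convert_sheet(source_csv: str) -> str:
    """Rename columns and trim whitespace; missing cells become empty strings."""
    reader = csv.DictReader(io.StringIO(source_csv))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(COLUMN_MAP.values()), lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(
            {new: (row.get(old) or "").strip() for old, new in COLUMN_MAP.items()}
        )
    return out.getvalue()


# Made-up example row, not taken from the field books.
example = (
    "number,species,locality,collector,notes\n"
    "234, Anopheles sp. ,Example locality,A. Collector,\n"
)
converted = convert_sheet(example)
print(converted)
```

Keeping the mapping in a single dictionary makes it easy to review which spreadsheet column feeds which import field before any data is loaded into a database.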

Coordinates
To determine the coordinates, we used Google Earth and the Georeferencing Calculator [19].
We used the point-radius method, which delineates a locality by a coordinate pair and a distance, the distance being the radius of a circle around that point [18].
The field data of the Bonne-Wepster collection did not contain any coordinates.
Georeferencing an 'old' collection is challenging. The locality descriptions often lacked specificity. When the locality name was not specific enough (for instance, the Lawa River, Suriname) or unknown locality names were used, we did not assign any coordinates. The maximum length of the radius was set at 100 km.
Since the collecting area of this collection was widespread (from Sydney, Australia to Whitehorse, Canada, and from Transvaal, South Africa to Pampanga, Philippines), an indication of the minimum and maximum latitude and longitude was considered pointless.
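To make the point-radius representation concrete, the following minimal sketch stores a locality as a coordinate pair plus a radius and rejects radii above the 100 km limit mentioned above. It is not the Georeferencing Calculator and not the workflow used in this project: the class and function names are hypothetical, and distances use the haversine formula on a spherical Earth, which is an assumption of this example.

```python
import math
from dataclasses import dataclass
from typing import Optional

MAX_RADIUS_KM = 100.0     # upper limit stated in the text
EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


@dataclass(frozen=True)
class PointRadius:
    """A locality as a coordinate pair and an uncertainty radius."""
    latitude: float
    longitude: float
    radius_km: float


def georeference(lat: float, lon: float, radius_km: float) -> Optional[PointRadius]:
    """Return a point-radius record, or None if the radius exceeds the limit."""
    if radius_km > MAX_RADIUS_KM:
        return None  # too imprecise: no coordinates assigned
    return PointRadius(lat, lon, radius_km)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def contains(area: PointRadius, lat: float, lon: float) -> bool:
    """True if (lat, lon) lies inside the uncertainty circle."""
    return haversine_km(area.latitude, area.longitude, lat, lon) <= area.radius_km


area = georeference(0.0, 0.0, 50.0)     # abstract example point on the equator
too_vague = georeference(0.0, 0.0, 150.0)
```

In this sketch, a point 0.3 degrees of latitude away from the centre (about 33 km) falls inside the 50 km circle, whereas a point one degree away (about 111 km) does not, and a locality that would need a 150 km radius receives no coordinates at all.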

Data validation
The field books were handwritten, so interpreting the locality and species names could be challenging. Some mosquitoes with a Bonne-Wepster number already had a locality and identification label. We always checked whether the locality on the label matched that in the field book. When we found a mismatch, we reported it in the section General Remarks.

Taxonomic identification of the specimen.
Certainty: Certainty of the identification of the specimen. 'Uncertain' if the identification was doubtful.
Name comments: Additional comments regarding the identification of the specimen, such as the specific information on which the identification was based. Any discrepancies between the identification in the field books and the one on the pin were also registered in this field.
TYPE status: If applicable, type information of the specimen, verbatim.
Additional comments regarding georeference information, e.g., remarks when a locality was not found or the margin of error proved too high (>100 km).
To preserve the historical character of the collection as much as possible, we decided not to synonymize the species names. Therefore, the dataset is provided with the names as originally given by Bonne-Wepster. However, spelling mistakes were corrected using the taxonomic checklist Culicipedia [20]. The same checklist was used to interpret genus and species names when they were abbreviated in the field books. This ensured that the names imported into the NBC database and ultimately exported to the Global Biodiversity Information Facility (GBIF) platform were free of spelling errors.
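Checking verbatim names against a checklist can be sketched as follows. This is an illustration under stated assumptions, not the procedure followed in this project: the three checklist entries and the abbreviation table are small hypothetical stand-ins for the full checklist, the function name `normalize_name` is invented, and approximate matching uses `difflib` from the Python standard library. No synonymization is attempted, in line with the decision described above.

```python
import difflib
from typing import Optional

# Tiny illustrative stand-in for a full taxonomic checklist.
CHECKLIST = ["Aedes aegypti", "Anopheles gambiae", "Culex quinquefasciatus"]

# Hypothetical table of genus abbreviations as they might appear in notes.
GENUS_ABBREVIATIONS = {"Ae.": "Aedes", "An.": "Anopheles", "Cx.": "Culex"}


def normalize_name(verbatim: str) -> Optional[str]:
    """Expand a genus abbreviation and correct small spelling errors.

    Returns the checklist spelling, or None when no close match exists,
    so that the record can be reviewed by hand.
    """
    parts = verbatim.strip().split()
    if parts and parts[0] in GENUS_ABBREVIATIONS:
        parts[0] = GENUS_ABBREVIATIONS[parts[0]]
    candidate = " ".join(parts)
    if candidate in CHECKLIST:
        return candidate
    # A high cutoff keeps the matching conservative (spelling slips only).
    matches = difflib.get_close_matches(candidate, CHECKLIST, n=1, cutoff=0.9)
    return matches[0] if matches else None


expanded = normalize_name("Ae. aegypti")
corrected = normalize_name("Anopheles gambie")
unresolved = normalize_name("Xyz abc")
```

Returning `None` rather than the nearest name for poor matches keeps uncertain cases visible, which suits handwritten sources where a wrong automatic correction would be hard to detect later.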
We never removed any labels under a specimen, and it was difficult to establish how discrepancies came about. Possibly, a mosquito expert later redetermined some specimens, or mistakes were made during the previous labeling process, but this is merely speculation.
Sometimes, the same number could refer to two specimens (e.g., no. 234 could be a specimen from the former CMGL collection or one from the former ITH collection). This could complicate establishing the collecting event. Working intensively with the specimens, we noticed a difference in how the two subcollections were labeled: the ITH collection had numbers written vertically, while the CMGL collection had numbers written horizontally (Figure 6).
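The ambiguity of shared numbers can be handled by treating the subcollection as part of the key. The sketch below assumes a simple record layout with hypothetical field names (`number`, `subcollection`); the orientation rule encodes the labeling difference noted above, and the example records are illustrative apart from no. 234, which is mentioned in the text.

```python
from collections import defaultdict

# Labeling difference noted in the text: handwriting orientation per subcollection.
SUBCOLLECTION_BY_ORIENTATION = {"vertical": "ITH", "horizontal": "CMGL"}


def ambiguous_numbers(records):
    """Return numbers that occur in more than one subcollection."""
    seen = defaultdict(set)
    for record in records:
        seen[record["number"]].add(record["subcollection"])
    return {
        number: sorted(subcollections)
        for number, subcollections in seen.items()
        if len(subcollections) > 1
    }


# Illustrative records; only the shared number 234 comes from the text.
example_records = [
    {"number": 234, "subcollection": SUBCOLLECTION_BY_ORIENTATION["horizontal"]},
    {"number": 234, "subcollection": SUBCOLLECTION_BY_ORIENTATION["vertical"]},
    {"number": 235, "subcollection": "CMGL"},
]
conflicts = ambiguous_numbers(example_records)
print(conflicts)  # {234: ['CMGL', 'ITH']}
```

Using the pair (subcollection, number) as the lookup key, instead of the number alone, lets each specimen be linked to the correct field book entry.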
The vast majority of the specimens were collected in Indonesia. Suriname ranks second in the number of collected specimens.

TAXONOMIC COVERAGE
The genera included in the database are listed in Table 3.

RE-USE POTENTIAL
The data made available through this project is valuable because it describes historical and recent records of mosquitoes and can be used in different research areas, including estimates of spatial distribution, modeling of current and future distributions through ecological niche modeling, systematics and vector control programs.
The species names attached to the records are the original ones and were not synonymized.

DATA AVAILABILITY
The dataset is available in the GBIF repository [21]. The original field books of Johanna Bonne-Wepster have all been digitized and are available in the Naturalis digital